Implicit Ligand Theory: Rigorous Binding Free Energies and Thermodynamic 

Expectations from Molecular Docking 



(N 

o 

(N 

< 
(N 



c3 



o 
o 



> 
in 
oo 
oo 

00 

o 

(N 



X 



David D. L. Minh* 

Department of Chemistry, Duke University, Durham NC 27708 USA 
(Dated: August 27, 2012) 

A rigorous formalism for estimating noncovalent binding free energies and thermodynamic ex- 
pectations from calculations in which receptor configurations are sampled independently from the 
ligand is derived. Due to this separation, receptor configurations only need to be sampled once, 
facilitating the use of binding free energy calculations in virtual screening. Demonstrative calcu- 
lations on a host-guest system yield good agreement with previous free energy calculations and 
isothermal titration calorimetry measurements. Implicit ligand theory provides guidance on how 
to improve existing molecular docking algorithms and insight into the concepts of induced fit and 
conformational selection in noncovalent macromolecular recognition. 



INTRODUCTION 

The goal of molecular docking is to predict the most 
stable configuration of a noncovalent complex between 
a ligand and receptor. Based on this configuration, the 
complex is assigned a score which may be used to ap- 
proximately rank the binding affinity of one ligand to the 
receptor versus another. Molecular docking has many po- 
tential applications, and has been most prominently ap- 
plied to the virtual screening [1, 2] of chemical libraries 
to aid the development of pharmaceuticals. 

Given the three-dimensional structure of a protein re- 
ceptor, docking algorithms have proven reasonably adept 
at sampling stable conformations of small organic ligands 
in the complex. Unfortunately, current scoring functions 
perform poorly at predicting binding free energies [3-5]. 
Hence, docking is typically used to filter a large library 
of potential ligands to a smaller binder-enriched library 
that may be pursued experimentally or by more accu- 
rate and expensive computational methods [6-10]. Even 
in this capacity, however, scoring functions are inconsis- 
tent, frequently presenting false positives (ligands pre- 
dicted to bind but actually have weak or no affinity) and 
false negatives (ligands predicted not to bind but actually 
have significant affinity). For example, docking programs 
often have difficulty distinguishing binding compounds 
from decoys in which the chemical connectivity has been 
randomized [3, 11]. Improved scoring functions would in- 
crease the capability to discern binders from non-binders. 

The improvement of scoring functions, however, has 
been hindered by the lack of a rigorous formalism for 
obtaining binding free energies from molecular docking. 
While molecular docking calculations are usually per- 
formed with a rigid receptor, existing formalisms for 
binding free energies require a flexible receptor. Here, I 
derive a formalism, implicit ligand theory, for estimating 
binding free energies and thermodynamic expectations 
based on docking ligands to rigid receptor structures. I 
also describe practical aspects of statistical estimation, 
present example calculations, and discuss how physics- 



based (opposed to empirical or knowledge-based) docking 
algorithms (see [12]) may be modified to exploit it. Be- 
yond molecular docking, implicit ligand theory provides 
insight into the concepts of induced fit and conforma- 
tional selection in noncovalent macromolecular recogni- 
tion. 



THEORY 

The standard binding free energy, the free energy of a 
noncovalent association between a receptor R and ligand 
L to form a complex RL, R + L ^ RL, is, 



AC . = W^), 



id 



where j3 = (fcgT) -1 is the inverse of Boltzmann's con- 
stant, ks, times the temperature in Kelvin, T, C° is 
the standard concentration (typically 1 M), and Cx is 
the equilibrium concentration of species X G {R,L, RL} 
[13]. 

Statistical thermodynamics relates the standard bind- 
ing free energy to a ratio of configurational partition func- 
tions [14], 
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in which symmetry numbers and a small pressure- volume 
term have been omitted from Eq. (2). Zrl^n and Zy,N 
are configurational partition functions of the complex and 
of the species Y G {R, L}, respectively, in N molecules of 
solvent. The potential energy U(rx,rs) depends on rx, 
the internal coordinates of the receptor, ligand, or both 
in complex (the external degrees of freedom have been 
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analytically integrated), and rg, the coordinates of N 
molecules of solvent. The complex coordinates vrl may 
be decomposed into the internal coordinates of the recep- 
tor, r R , and of the ligand, r^, and six degrees of freedom 
describing their relative translation and rotation, £l . For 
simplicity, Jacobians for the transformation from Carte- 
sian coordinates to a system with separated internal and 
external degrees of freedom are not shown in Eqs. (3) 
and (4). In Zrl,n, the indicator function 1^ = 
takes values between and 1 and determines whether 
the receptor and ligand are complexed or not. For tight- 
binding complexes, the binding free energy is insensitive 
to the precise definition of 1^ [14]. 



Implicit Solvent Theory 

The configurational integrals in Eq. (2) may be ex- 
pressed in a formally equivalent but simpler form us- 
ing implicit solvent theory [14]. In implicit solvent the- 
ory, the interaction energy is defined as ip(rx ,rs) = 
U(rx,rs) — U(rx) — U(rs), where U(rx) is the poten- 
tial energy of species X by itself and U(rs) the potential 
energy of the solvent by itself. By integrating the config- 
urational partition functions over rs, we may define the 
ratios, 
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is a potential of mean force that can be interpreted 
as the constant-pressure reversible work of transferring 
the species X from the gas phase into the solvent. In 
biomolccular modeling, W(rx) is frequently estimated 
as the sum of an electrostatic term from the Poisson- 
Boltzmann equation [15] (or the Generalized Born ap- 
proximation [16]), and a non-electrostatic term, which 
to a first approximation is proportional to the molecular 
surface area. 

In terms of implicit solvent configurational integrals, 
the standard binding free energy is, 
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As most implicit solvent models fail to account for spe- 
cific interactions, such as hydrogen bonding, that can 
have important structural and energetic consequences, 
binding free energy calculations in implicit solvent are 
generally expected to be less accurate than those in ex- 
plicit solvent [17]. Nevertheless, binding free energy 
calculations in implicit solvent have yielded promising 
agreement with experimental results (e.g. [18-21]). 



Implicit Ligand Theory 

The development of implicit ligand theory is very sim- 
ilar to that of implicit solvent theory. It involves defining 
the effective potential as,U(rx) = U(rx)+W(rx), the ef- 
fective interaction energy as ^(r^i) = U(r R L) — W(t\r) — 
U(r L ), and, 
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which is a potential of mean force that will subsequently 
be referred to as the binding PMF. Throughout this pa- 
per, angled brackets (■■■) x will be used to denote an 
ensemble average over the coordinates r listed in the 
superscript with respect to the density proportional to 
Qx,..., where X describes the coordinates in the effective 
potential U(rx), and ... are labels. Here, 9l.7(^l,^l) = 
l£e~P u ( rL \ Within angled brackets, I will use a short- 
hand notation in which functions implicitly depend on 
coordinates, e.g. 3/ = ^(trl)- 

In terms of the binding PMF, Eq. (9) may be written 
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where f2 = J I^d^L (which may be analytically tractable) 
is the binding site volume, AG^ = — (3~ x In (^r) is the 
free energy of confining the ligand external degrees of 
freedom to the binding site, and qR(r R ) = e~^ utyTR ^ . Eqs. 
(10) and (11) are the central theoretical results of this 
paper. 

Implicit ligand theory provides a rigorous framework 
for binding free energies that separates the sampling of 
receptor and ligand configurations. In Eq. (11), the re- 
ceptor probability density is independent of any ligand 
configuration. Likewise, the probability density of ligand 
internal coordinates in Eq. (10) is independent from the 
receptor configuration. In practice, however, sampling 
from this ligand distribution may lead to slow conver- 
gence (this point will later be discussed in greater de- 
tail). The primary benefit of implicit ligand theory is 
that the computationally expensive step of sampling re- 
ceptor configurations only needs to be performed once. 
Predicting binding free energies for a chemical library is 
then limited by the much faster process of sampling lig- 
and conformations. 
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Thermodynamic Expectations 



Receptor Configurations 



In addition to estimating the binding free energy, im- 
plicit ligand theory may also be used to estimate expected 
values of observables in the bound ensemble. Observables 
may include, for example, the mean potential energy, in- 
teraction energy, or distance between a ligand and recep- 
tor atom. Towards this end, it is useful to define a rigid- 
receptor expectation of an observable 0(r R i,), weighted 
by the interaction energy, 
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If the observable is solely a function of the receptor con- 
figuration, then 0(?\r) reduces to 0(r R )e~^ B ^ TR \ 

In terms of Eqs. (10) and (12), the expectation 
of 0(t R l) with respect to the density proportional to 
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Eqs. (12) and (13) significantly generalize implicit lig- 
and sampling [22] , a method to estimate the potential of 
mean force for the ligand center of mass. The results of 
Cohen et al. [22] may be obtained by choosing the ob- 
servable as a Dirac delta function for the ligand center 
of mass, taking a natural logarithm, and multiplying by 
Cohen et al. [22] applied implicit ligand sampling 
to study gas migration pathways in myoglobin, but the 
possibility of estimating other observables and binding 
free energies has not been previously recognized. 



ESTIMATION 

Applying implicit ligand theory to predicting binding 
free energies involves three steps: 

1. Sampling receptor configurations. 

2. Estimating the binding PMF, B(r R ), for each re- 
ceptor configuration. 

3. Estimating AG° from B(r R ) estimates. 

In this section, I present several ways, roughly in order 
of increasing complexity, that these steps may be accom- 
plished. A variant of one approach will be demonstrated 
later in the paper. 



Receptor configurations can be drawn from q R (r R ), 
any (possibly unnormalized) distribution q R , w {r R ) on 
the same support as q R (r R ) and for which w(r R ) = 
q R (r R )/q R ,w(r R ) may be calculated, or from multiple 
distributions satisfying these conditions. Regardless of 
the sampling method, however, convergence of free en- 
ergy estimates requires representative sampling of both 
the bound and unbound receptor configuration space. A 
particularly straightforward protocol is to sample from 
the distribution proportional to q R (r R ) = e~^ u( ^ rRI ] one 
conducts a molecular dynamics (MD) simulation in the 
implicit solvent used for W(r R ), collecting snapshots at 
evenly spaced intervals that are longer than the statisti- 
cal correlation time. This protocol may be satisfactory 
if receptor fluctuations are minimal and the ligand does 
not significantly perturb the receptor configurational en- 
semble. 



For a receptor that undergoes larger structural fluctu- 
ations, sampling from multiple energetic minima may be 
facilitated by applying an external biasing potential (e.g. 
a harmonic bias) on one or more order parameters. If it is 
known that a ligand significantly perturbs the receptor 
configurational ensemble, it can be useful to introduce 
multiple alchemical intermediates into a simulation. Al- 
chemical calculations may involve a coupling parameter 
A, defined such that the two groups (e.g. the receptor and 
ligand) are non-interacting at A = and fully interacting 
with A = 1. Simulations are conducted with A at these 
end points and at multiple values in between. Sampling 
in each stage may be enhanced by Hamiltonian replica ex- 
change (e.g. Gallicchio et al. [21], Jiang et al. [23], Gallic- 
chio and Levy [24]), which entails stochastically swapping 
the coordinates of different simulations with a probabil- 
ity that preserves the Boltzmann distribution. Recep- 
tor configurations obtained through a flexible-receptor 
Hamiltonian replica exchange with a single ligand may 
subsequently be used for implicit ligand free energy cal- 
culations with other ligands in the chemical library. 

As a caveat, implicit ligand theory does not provide 
a formal justification for docking to multiple experimen- 
tally determined structures (e.g. [25]) or any other set 
of structures in which w(r R ) is unknown (e.g. homol- 
ogy modeling or flexible docking) . One potential way to 
use information about multiple structures is to conduct 
multiple MD simulations with external potentials biased 
towards one or more of the structures. To facilitate later 
analysis, the external potentials should be set up to pro- 
mote overlap in the configuration space of different sim- 
ulations. 
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Estimating a Binding PMF 

The binding PMF B(r R ) may be expressed in terms of 
a ratio of partition functions, 



B{r R ) = -/TMn 
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which clarifies that B(r R ) is a special type of free en- 
ergy difference in which the receptor configuration r R 
is rigid. Thus, B(r R ) may be calculated using any one 
of many available methods to estimate free energy differ- 
ences [26], including free energy perturbation (FEP) [27], 
thermodynamic integration (TI) [28], and the Bennett 
Acceptance Ratio (BAR) [29]. While formally equiva- 
lent, free energy methods can have dramatically different 
convergence properties. 

Based on the form of Eq. (10), the most straightfor- 
ward estimation protocol is FEP. One can, for example, 
draw ligand configurations from the distribution propor- 
tional to qij( r L i £l) = &~^ u ^ Th ' by conducting a MD sim- 
ulation of the ligand in the appropriate implicit solvent 
and collecting snapshots at sufficiently long intervals. Be- 
cause ql is independent of the external degrees of 
freedom sampled from the simulation may be replaced 
by a new £x sampled from the distribution proportional 
to q^j — 1^. The expectation in Eq. (11) may then be 
estimated by the sample mean, 



B(r R ) 
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where r R L, n is the n* /l of N samples of the complex. 
Throughout this paper, A will denote a statistical es- 
timator - an equation used to calculate a quantity based 
on sampled data. 

In exponential averages such as Eq. (15), a small sub- 
set of samples may contribute a large portion of the sum. 
The limiting case of an individual important sample in- 
spires the severe dominant state approximation, in which 
a single value of ^{t R l) is used to estimate B{r R ). Ex- 
ponential averages may also be estimated via a cumulant 
expansion [30], here shown for Eq. (10) to the fourth 
order, 
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where = $(r RL ) - j . Each expectation in 
the cumulant expansion may be estimated by the sample 
mean. 

While formally correct, this approach to ligand sam- 
pling can converge slowly if most ligand configurations 
placed in the binding site have overlapping atoms and 
high values of ^{t R l)- One potential solution to this 



problem is to sample the external degrees of freedom from 
a distribution biased towards energetically favorable ori- 
entations by a confining potential U c (£l). Multiplying 
and dividing Eq. (10) by Q c = J I^-^-^d^ and the 
integrand in the numerator by e~' 3c/<: ^ L ^ leads to, 

where q LJc = I^r fi ^ r ^ +U ^ L ^. Good choices for 
U c (£,l), which may be ascertained from existing molecu- 
lar docking algorithms (as will be discussed later in the 
paper), will favor the sampling of poses with low ^(r R L). 

Alternatively, the binding PMF may be calculated us- 
ing the inverse form of Eq. 10, 



B(r R ) =0- 1 ki 



J I^e-W^^) dr L d£ L J 

Ligand configurations from the distribution proportional 
to q R L,i{ r L: = I^e^^ u ^ rRL ^ may be sampled, for ex- 
ample, from an implicit-solvent MD simulation in which 
the receptor is held rigid and the ligand is allowed to 
move, and the expectation estimated using the sample 
mean estimator. 

This straightforward procedure is also problematic be- 
cause of the rarity of sampling configurations in which the 
ligand is separated from the receptor or in which they 
overlap. While these configurations are insignificant in 
the conformational ensemble in which receptor and ligand 
are fully interacting, they are relevant to the ensemble of 
noninteracting ligand and receptor, and the convergence 
of free energy differences requires phase space overlap 
between adjacent thermodynamic states [26]. The phase 
space overlap problem may also be alleviated by calcu- 
lating the free energy difference with a reference state in 
which the external degrees of freedom are confined, 



B(r R ) = (T 1 In (e?V- u A rL * L - /T 1 In ^. (19) 

The binding PMF may be estimated from the same sam- 
ples as with Eq. (18), and will be more accurate the more 
closely e~^ Uc ^ L ' resembles the distribution of £l in the 
complex. 

As discussed, phase space overlap problems are often 
resolved by introducing multiple alchemical stages into 
a calculation, and sampling may be enhanced by Hamil- 
tonian replica exchange. With multiple stages, the total 
free energy difference between states with A = and 
A = 1 is the sum of free energy differences between adja- 
cent stages, each of which may be estimated by FEP [27], 
TI [28], or BAR [29]. Alternatively, the total free energy 
difference may be estimated by the multistate Bennett 
Acceptance Ratio (MBAR) [31]. 
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Estimating the Binding Free Energy 



DEMONSTRATION 



Once B(tr) is evaluated for each receptor configura- 
tion, the binding free energy may be calculated by esti- 
mating an ensemble average. The appropriate method 
for estimating AG° depends on how the receptor con- 
figurations rfj are sampled. If they are drawn from the 
distribution qti(rR), then the expectation in Eq. (11) 
may be estimated by the sample mean, 



AG° 
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,-/3B(r B ,„) 
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in which B(r^ n ) is the estimated binding PMF for the 
n*" of N receptor configurations. Because the implicit- 
ligand expression for the binding free energy, Eq. (11), 
has the same form as Eq. (10), the dominant state ap- 
proximation and cumulant expansion may also be ap- 
plied. 

If receptor configurations are drawn from a biased dis- 
tribution, the importance sampling identity, 



J 0{r)qx {r)dr 

J q T {r)dr 
J 0(r)w(r)qs (r)dr 

J w(r)q s (r)dr 



( w )s 



(21) 



may be applied. In this generic expression, w(r) = 
qT{r)/ qs{r) is a ratio of unnormalized densities qr(r) for 
the target distribution and qsif) for the sampling dis- 
tribution. Using the sample mean estimator and impor- 
tance sampling identity for the expectation in Eq. (11) 
leads to, 

AG = -/T 1 In Ln=i y nJe + AG 5 (22) 

En=l w ( r R,n) 

If receptor configurations are drawn from multiple biased 
distributions, then the expectation may be estimated us- 
ing MBAR [31]. 



Thermodynamic Expectations 

Thermodynamic expectations may be estimated from 
the same data as the binding free energy. The appropri- 
ate estimator for 0(r#) will depend on how the ligand 
configurations were sampled. Once 0(t\r) is estimated 
for every sampled receptor configuration, the appropri- 
ate estimator for the expectation in Eq. (13) similarly de- 
pends on how the receptor configurations were sampled. 
In the simplest case for O(r-fl), if ligand configurations 
are sampled from q^j, then 0(t\r) may be estimated by 
a sample mean. In other cases, O(r^) and {0) r ^ 1 may 
be estimated using importance sampling, MBAR [31], or 
a combination thereof. 



As a demonstration, implicit ligand theory calcula- 
tions were performed to estimate the standard binding 
free energy of various ligands to Cucurbit [7] uril (CB[7]) 
in water. The binding of CB[7] to a number of fer- 
rocenes, adamantanes, and bicyclooctanes has been well- 
characterized by both isothermal calorimctry and second- 
generation mining minima (M2) [18, 19] free energy cal- 
culations [32, 33]. Receptor configurations were sampled 
by molecular dynamics, binding PMFs estimated with a 
multi-stage alchemical calculation and MBAR [31], and 
the binding free energy calculated using Eq. (20) or the 
dominant state approximation. 



Methods 

Molecular dynamics simulations at 300 K were per- 
formed with a slightly modified [34] compilation of 
NAMD [35] version 2.9. When appropriate, CB[7] was 
fixed using the fixedAtoms parameter. The "commer- 
cial" force field parameters and topologies from Moghad- 
dam et al. [33] were used for both CB[7] and its ligands. 
To match the force field from Moghaddam et al. [33] as 
closely as possible, 1-4 electrostatics were scaled by 0.5 
and the nonbonded cutoff was set to 999 A, which ef- 
fectively turns off cutoffs. Water was represented with 
the Generalized Born Surface Area (GBSA) implicit sol- 
vent model without ions and a surface tension of 0.006 
kcal/mol/A 2 . The receptor dielectric was 1.0 and solvent 
dielectric was 78.5. A time step of 1 fs (using a 2 fs time 
step with fixed atoms led to unstable trajectories) was 
used with Langcvin dynamics. 

CB[7] was minimized for 2500 steps and thcrmalizcd 
by increasing the temperature by 10 K and reinitializ- 
ing velocities every 100 steps from to 300 K. Receptor 
snapshots were saved every 0.1 ns from a trajectory of 10 
ns. 

Binding PMFs for every ligand in Moghaddam et al. 
[33] with the minimized CB[7] structure (15 repetitions 
each) and 100 receptor simulation snapshots (1 repeti- 
tion each) were estimated using Hamiltonian replica ex- 
change, which can simultaneously dock a ligand and com- 
pute its binding free energy [21, 24]. The implementation 
is similar to that from Gallicchio and Levy [24], except 
that the receptor configuration is fixed. A reservoir of lig- 
and configurations [24] was generated by simulating the 
ligand for up to 10 ns and saving snapshots every 10 ps. 
Simulations of the complex in which A controls the extent 
of interaction between the ligand and receptor were run 
with A e {0,10- 5 ,10- 4 ,10- 3 ,10- 2 , 0.1,0.2,0.3,0.4,0.5, 
0.6,0.7,0.8,0.9,0.95,1.0}. As implemented in NAMD, 
intermediate values of A used a soft-core potential with 
a van der Waals shift coefficient of 5. Electrostatic inter- 
actions were turned on when A = 0.5. Using the colvars 
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module, a fiat-bottom harmonic potential with a spring 
constant of 10 kcal moU 1 A -1 and starting at 0.75 A was 
used to restrain the center-of-mass distance between the 
ligand core (heavy atoms except for the R groups in 
Moghaddam et al. [33]) and the receptor heavy atoms. 
This potential keeps the ligand within the binding site 
when interactions are turned off. The binding site vol- 
ume, Q = J I^d^L, is approximated as 4/37r(0.75 3 )(87r 2 ). 
Because NAMD does not allow the simultaneous use of 
alchemical decoupling and implicit solvent, simulations 
were conducted in vacuum. 

The replica exchange simulation was initiated by tak- 
ing a random ligand configuration, applying a random 
rotation, and randomly placing it within the binding site. 
This initial configuration was minimized and thermalized 
with the same protocol as with CB[7], except that it was 
done in vacuum. The thermalized structure was used to 
start each replica. Occasionally, the random placement 
of the ligand led to high forces that caused the simula- 
tions to crash; in this case, the simulation was restarted 
with a different random initial configuration. 

After every 5 ps of simulation for every value of A, 1000 
replica exchanges were attempted between each pair of 
adjacent A windows. After each set of replica exchange 
attempts, the ligand configuration for A — was replaced 
with a random ligand configuration from the reservoir, 
randomly rotated, and placed in the binding site. (This 
type of reservoir swap satisfies detailed balance.) The 
simulation was conducted for 25 cycles, saving snapshots 
every 0.5 ps, for a total of 2 ns of simulation for each 
binding PMF. The docking and equilibration period, de- 
fined as the time before the potential energy of the fully 
coupled state is within 20 k R T of its energy for the final 
snapshot, was ignored in subsequent analysis. 

Because alchemical coupling calculations were per- 
formed in vacuum, binding PMFs were estimated based 
on a decomposition of B(r R ), 

B(r R ) = B cpl + B RL -B L - AU(r R ) (23) 
J Ize-W^t)+u(rn)]dr L d£ L J 

f It:e-P u ( r R^dr L d£ L J 

m e -^U(r L ) e -0U(r L ) drLd ^ \ 

J I^e-^^^dTLd^L ) ' 

B cp i is the free energy of turning on the interactions be- 
tween the ligand and the rigid receptor in vacuum. B R l , 
£?l, and AU(r R ) are free energies of transferring the com- 
plex, ligand, and receptor, respectively, from vacuum to 
the target state (in implicit solvent). They are based on 
AU(rx) = Ut{tx) — U(rx), the potential energy differ- 
ence between rx in the target state versus the state from 
which configurations were sampled (in vacuum). B cp i 




was estimated by applying MBAR [31] to snapshots from 
every 0.5 ps of simulation, and B R l and Bl by single- 
step FEP (evaluating transfer free energies by MBAR 
would require calculating target-state potential energies 
for every snapshot using computationally expensive force 
fields). 

This decomposition makes it straightforward to eval- 
uate B(r R ) for a variety of force fields using the same 
configurational samples. In this work, four are compared: 

1. NAMD: the total potential energy from using 
GBSA in NAMD [35]; 

2. M2: the total potential energy from using the 
GBSA model in the M2 program [18, 19]; 

3. PB: Poisson-Boltzmann electrostatic solvation free 
energies from UHBD [36] and bond, angle, dihedral, 
coulomb, and van der Waals energies from the M2 
program [18, 19]; 

4. PBSA: Poisson-Boltzmann electrostatic solvation 
free energies from UHBD [36] and bond, angle, di- 
hedral, coulomb, van der Waals, and nonpolar sur- 
face area energies from M2 [18, 19], the combination 
used in Moghaddam et al. [33]. 

During this step, the NAMD, M2, and UHBD programs 
are used strictly for single-point energy evalulations, not 
for minimization or dynamics. Poisson-Boltzmann ener- 
gies were calculated with a grid spacing of 0.18 A with 
dimensions such that the maximum dimensions of the 
molecule are 0.7 (or less) of the final grid [33]. For 
comparison, binding PMFs were also calculated from the 
dominant state approximation with PBSA energies, us- 
ing the lowest value of ^{trl) observed in the simulations 
with A = or A = 1. 

Because receptor configurations were sampled from a 
simulation in GBSA implicit solvent, binding free ener- 
gies were estimated by using Eq. (22). Binding free 
energies were also estimated with the dominant state ap- 
proximation: using the lowest observed value of B(r R ) 
to estimate — In (e~' 3B )^ H . 

To demonstrate the calculation of thermodynamic ex- 
pectations and for comparison with results from Moghad- 
dam et al. [33], the mean values of six PBSA energies - 
van der Waals, coulomb, electrostatic solvation, valence 
(bond + angle + dihedral), nonpolar solvation, and total 
- were estimated for the complex, the receptor, and the 
ligand. Mean PBSA energies for the ligand and recep- 
tor were estimated by applying the importance sampling 
identity to the ligand from the non-interacting system in 
vacuum and to the receptor from the GBSA simulation, 
respectively. Occasionally, energies in the ligand trajec- 
tory briefly spiked to very high values. In estimating the 
mean PBSA energies, these spikes were filtered out by 
removing data points in which the total PBSA energy is 
at least 100 k R T larger than the PBSA energy of the final 
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snapshot. As the spikes were likely caused by the finite 
molecular dynamics time step, they would probably be 
avoided by using a propagator the exactly preserves the 
Boltzmann distribution, e.g. Hybrid Monte Carlo [37]. 

Towards estimating the mean PBSA energies of the 
complex, rigid-receptor expectations were estimated 
by applying MBAR [31] to snapshots from the non- 
interacting and fully interacting states, 



J2n=l w ( r RL,n)0(r RL ^ n ) 



w(r RL ) 



p -P(U P bsa(Rl)-U (Rl)) 



(24) 



Hx e -P{Uy{r RL )-B cp i-U (r RL ) ' 



where Uo(rm,) and Ui(rm,) are the potential energies of 
the non-interacting and fully interacting complexes, re- 
spectively, Upbsa(tl) is the PBSA energy of only the 
ligand, and rnL,n is the n th of N snapshots of either the 
non-interacting (Nq snapshots) or fully interacting com- 
plex (TVi snapshots). B cp i was estimated by using MBAR 
[31] with all replicas. While it would be possible to esti- 
mate the mean PBSA energies using all snapshots from 
all replicas, this was avoided because of the computa- 
tional expense of Poisson-Boltzmann calculations, which 
can take over a minute per snapshot. After obtaining 
0(r^{), the importance sampling identity, Eq. (21), was 
used to estimate the expectations in Eq. (13). To en- 
sure consistency of the estimator - an estimate of a con- 
stant yields the same constant - 0(t\r) was calculated for 

O = 1, in which case ^O(rR)^ = (e 

timate of 
(13). 



[e-Pzy*. This es- 



-f3B\ r R 
>R 



was used in the denominator of Eq. 
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FIG. 1. The mean and standard deviation of 15 indepen- 
dent estimates of B(rR), B cp i, Brl, Bl, and mm{ty(rRL)} 
(kcal/mol) based on PBSA energies as a function of total MD 
simulation time for the ligand B02. Analogous plots for the 
other ligands in this study are available as Fig. SI in the 
Supplemental Material. 



Results 

Highlighting the importance of an accurate molecular 
mechanics model, binding PMF estimates are strongly 
dependent on the force field, as shown in Table I. For the 
large and highly charged bicyclooctane Bll, switching 
the force field causes the binding PMF to change nearly 
40 kcal/mol! With increasing magnitude of charge, larger 
coulomb energies lead to larger values of B cp i and larger 
electrostatic solvation free energies increase the magni- 
tude of .Brl, Bl, and AU{rn) (for estimates of B cp i, 
Bjil, and Bl, see Table SI of the Supplemental Mate- 
rial. Thus, estimating the binding PMF with Eq. (23) 
entails the difficult task of computing a relatively small 
difference between large values. The importance of the 
force field has also been noted for M2 calculations [18, 19]. 
An alternate implementation, e.g. conducting replica ex- 
change within implicit solvent rather than vacuum, may 
not require the implicit solvent model to be as accurate. 

With 2 ns of total simulation for all replicas, the stan- 
dard deviation of binding PMF estimates ranges from 



0.12 to 1.63 kcal/mol (Table I), with most estimates on 
the lower range of imprecision. For all of the compo- 
nents of Eq. (23), the mean estimate does not appear 
to shift after about 0.75 ns, and additional sampling re- 
duces the standard deviation of the estimate (see Fig. 1 
and Fig. SI in the Supplemental Material. There is no 
unique component that limits the convergence of i?(r^); 
the slowest converging component varies from ligand to 
ligand. The binding PMF estimate B(rn) and the min- 
imal interaction energy min {^(trl)} converge at about 
the same rate, suggesting that the limiting factor for con- 
vergence is finding a configuration with the lowest inter- 
action energy This interpretation is corroborated by the 
fact that largest ligands with the most rotatable bonds 
(see Moghaddam et al. [33] for structures) also have the 
most variance in B{tr), as the flexibility increases the 
challenge of finding configurations with low ^(rjn,). 

The accuracy of binding free energy estimates was 
assessed with the correlation coefficient and root mean 



8 



square error, 



RMSE(ml,m2) 



i 



Lml 



AG?. m2 ) 2 (25) 



1=1 



between methods ml and m2, where AG° m is the bind- 
ing free energy estimate for ligand I of L ligands using 
method m (Tables I and II). 

Binding free energy estimates based on the binding 
PMF for a minimized receptor structure suffices to pro- 
vide high correlation with experiment (R 2 = 0.884 for 
NAMD) and M2 free energy calculations (R 2 = 0.827 
for NAMD) (see Table I). Surprisingly, binding free en- 
ergies from NAMD GBSA calculations are more highly 
correlated to these benchmarks than AG° from PBSA 
calculations. Ironically, the high correlation may be ex- 
plained by inaccurately large binding PMF values result- 
ing from highly charged ligands, as the molecules in this 
set with the strongest charges also tend to have stronger 
binding affinities. Although the correlation coefficient is 
high, the RMSE is also considerable, over 10 kcal/mol. 
Similar performance (R 2 and RMSE) is observed by us- 
ing the dominant state approximation with PBSA cal- 
culations. In contrast, using Eq. (23) with PBSA leads 
to less correlated (lower R 2 ) but more accurate (lower 
RMSE) estimates of the binding free energy. 

Even for this simple system, binding free energy esti- 
mates are substantially improved by using multiple recep- 
tor structures (Table II). With binding PMFs from PBSA 
energies for 100 receptor structures, there is both higher 
correlation and lower RMSE with respect to experiment 



(R 



Exp 



0.704, RMSEb™ 



spect to M2 free energy calculations (R% Uso , 



4.5) and especially with re- 

, huson = 0.925, 
RMSE Giison = 2.4). 

While there are some variations on the order of a few 
kcal/mol, mean potential energy changes upon complex- 
ation are also consistent with results from Moghaddam 
et al. [33] (Tabic III). Minor discrepencies between M2 
and implicit ligand free energy and mean potential energy 
calculations may be explained by a combination of im- 
perfect sampling in the current calculations and the ap- 
proximations in M2. As the described calculations were 
performed in vacuum, the samples may not be from the 
same configurational space as those in implicit solvent. 
On the other hand, M2 assumes that the energy land- 
scape of the ligand, receptor, and complex are a trun- 
cated harmonic wells with anharmonicity corrections. 

Compared to the full procedure for estimating the 
binding PMF, applying the dominant state configuration 
leads to a reduction in the correlation with M2 results 
and an increase in the RMSE (Table IV). In contrast, 
applying the dominant state approximation to calculate 
AG° from B(rn) leads to a near-constant reduction of 
about 3 kcal/mol in the estimated binding free energy. 
While the RMSE increases, the correlation with M2 re- 
sults remains nearly identical. Given the same B(rji) 
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FIG. 2. (a) Histogram of binding PMF estimates B(r R ) 
(kcal/mol) of B02 to 100 snapshots of CB[7], using PBSA en- 
ergies. The vertical line shows the mean binding PMF for the 
minimized receptor structure, (b) and (c) Estimates of the 
binding free energy AG° of B02 to CB[7] (kcal/mol), using 
PBSA energies, as a function of the number of receptor snap- 
shots. The line and error bars denote the mean and standard 
deviation from bootstrapping: the binding free energy is es- 
timated 100 times using random selections of N out of 100 
binding PMFs. Analogous plots for the other ligands in this 
study are available as Figs. S2 and S3 in the Supplemental 
Material. 



results, however, there is essentially no reason to apply 
the dominant state approximation rather than Eq. (22). 

There is considerable variation in the binding PMFs 
for the 100 receptor structures (Fig. 2 and Fig. S2 in 
the Supplemental Material. For most of the ligands, the 
range of binding PMFs span 10 to 20 kcal/mol. While 
the binding PMF of the minimized structure is often near 
the lower end of the binding PMF distribution, this is not 
always the case. In larger ligands, the binding PMF ap- 
pears to be lower for other receptor structures. The fact 
that a single structure does not always lead to the lowest 
binding PMF shows a major limitation of using a single 
receptor structure to estimate binding free energies. 

In spite of the variability of binding PMFs, for the lig- 
ands in the test set, the average value of AG° appears to 
stabilize after a relatively small number (about 15) of re- 
ceptor snapshots (Fig. 2 and Fig. S3 in the Supplemental 
Material. Using a greater number of snapshots slightly 
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reduces the variance of binding free energy estimates. Af- 
ter the certain point, however, further reduction in the 
variance of AG° is limited by the variance in binding 
PMF estimates. 



DISCUSSION 

While the good agreement between implicit ligand and 
M2 calculations provides a proof of principle, the conver- 
gence and accuracy of implicit ligand calculations will 
differ with other classes of receptor-ligand pairs. With 
protein-ligand pairs, for example, representative sam- 
pling of receptors and finding low-energy poses of the 
ligand will likely require much more MD simulation time. 
On the other hand, many protein-ligand systems are not 
as strongly charged and may be less sensitive to the elec- 
trostatic solvation free energy. Due to these variabilities, 
assessments for the feasibility of implicit ligand calcula- 
tions in different classes of systems will prove valuable. 
Tests for convergence and accuracy may be similar to 
those performed for the CB[7] system. 

Numerous opportunities remain for further method- 
ological improvement and optimization of implicit ligand 
free energy calculations. The accuracy of implicit ligand 
calculations (and M2 calculations) may be limited by the 
quality of the force field. The decomposition of the bind- 
ing PMF in Eq. (23) provides a facile means to inte- 
grate alternate and potentially more expensive potential 
energies, e.g. quantum mechanical calculations or more 
sophisticated nonpolar solvation free energies. Modeling 
may also be improved by the inclusion of a few explicit 
water molecules (see Supplemental Material.) Another 
potential avenue for improvement is the fine-tuning of 
the replica exchange protocol (e.g. using implicit solvent 
or optimizing the number of stages and values of A for a 
particular system) or implementing alternative methods 
to estimate the binding PMF and binding free energy. 

Even without modifying the replica exchange protocol, 
computations may be accelerated by optimizing existing 
MD simulation packages for implicit ligand theory. Few 
modern MD simulation programs take full advantage of 
rigid degrees of freedom by skipping the calculation of 
pairwise interactions between rigid atoms. Even fewer 
implicit solvent models are designed with rigid receptors 
in mind [38]; implicit ligand theory may inspire the de- 
velopment of such models. 

Implicit ligand theory also provides guidance on how to 
understand and improve existing molecular docking algo- 
rithms. The definition of ^(tvrl) provides a straightfor- 
ward functional form that can be used to account for sol- 
vation free energies and ligand internal energies (strain), 
which have been noted to be important factors in bind- 
ing free energies [39], but are frequently ignored in the 
interaction energy functions used by docking packages. 
Implicit ligand theory also delineates how to improve the 



ranking of different ligands. Molecular docking packages 
currently rank receptor-ligand binding free energies based 
on a single low-energy configuration. As such, they ap- 
ply the crudest form of implicit ligand theory, the dom- 
inant state approximation, to estimate both the binding 
PMF and the binding free energy. With important mod- 
ifications to existing algorithms and the application of 
more complex estimators, the accuracy of scoring func- 
tions should be enhanced. 

One important potential change to molecular docking 
is the inclusion of multiple receptor configurations. While 
most modern docking packages account for the orienta- 
tion and flexibility of the ligand, the large number of 
coordinates makes the treatment of receptor flexibility 
challenging. A number of groups have improved docking 
performance by treating receptor flexibility by using mul- 
tiple structures from crystallography [25, 40, 41] or MD 
simulations [42] (the relaxed complex method [7, 43, 44]). 
Molecular dynamics simulations have also revealed bind- 
ing sites not discovered by crystallography [45, 46]. In 
the case of HIV integrase, insight into a new binding site 
even inspired the development of a new drug [47] . 

Despite of this success, it has hitherto remained un- 
clear how to combine information from docking to differ- 
ent receptor snapshots. While averaging strategies have 
been empirically compared [48], the default strategy has 
been to rank the ligand using the minimal energy from 
docking to all the snapshots. With implicit ligand theory, 
it is clear that the binding free energy may also be esti- 
mated by using an exponential average or cumulant ex- 
pansion of the binding PMF (which may still come from 
the dominant state approximation) for different snap- 
shots. 

The computational expense of the relaxed complex ap- 
proach may be reduced by clustering snapshots and se- 
lecting a representative snapshot from each cluster [44]. 
Assuming that the binding PMF is constant within the 
cluster, estimated averages may be weighed by the clus- 
ter size. Using a clustering algorithm based on QR fac- 
torization to select 33 representative structures, Amaro 
et al. [44] were able to accurately reproduce a histogram 
of docking scores to over 400 structures. Further research 
will be necessary to develop and validate algorithms that 
reliably cluster receptor configurations in which the bind- 
ing PMF is nearly constant. 

Compared to the inclusion of multiple receptor struc- 
tures, a more difficult task is the estimation of B(rn) us- 
ing molecular docking, as this involves a paradigm shift 
from searching for a minimum to sampling from a dis- 
tribution. Docking algorithms may be broadly classified 
into two categories: matching and docking simulation 
[49]. Matching algorithms such as DOCK [50] attempt 
to match a ligand into a model of the binding site. DOCK 
models both the ligand and the binding site as a set of 
spheres and uses algorithms from graph theory to align 
the ligand spheres into the binding site spheres. In dock- 
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ing simulation methods such as AutoDOCK [49, 51-53] 
and MCDOCK [54], the ligand starts outside the bind- 
ing site and its configuration and orientation are progres- 
sively modified to search for the lowest energy configura- 
tion of the complex. 

Matching algorithms can estimate -B(t\r) by a postpro- 
cessing algorithm. That is, after low-energy complexes 
are found, they may be used to bias receptor-independent 
random sampling of the ligand orientation by a confin- 
ing potential U c (£l), for use in Eq. (17). With a har- 
monic potential for J7 c (£l), the ligand orientation will 
come from a Gaussian distribution. An alternative post- 
processing algorithm is to use the lowest-energy struc- 
ture from a matching algorithm as a starting point for a 
rigid-receptor MD simulation. This is not prohibitively 
expensive; Graves et al. [8] even used MD simulations 
with flexibility near the binding site as a postprocessing 
step for molecular docking. Samples from this simulation 
would be used to estimate B(tb) based on Eq. (18) or 
Eq. (19). 

Docking simulation methods, on the other hand, will 
need to be modified to sample from a known distribu- 
tion rather than to search for the minimum energy. This 
change may not require a complete revamp. Docking sim- 
ulation algorithms are often based on Monte Carlo ap- 
proaches, which preserve a desired distribution or may be 
readily modified to do so. For example, MCDOCK [54] 
and early generations of AutoDOCK [51, 52] use simu- 
lated annealing, a procedure for which it is possible to 
calculate the importance sampling weight [55]. 

In addition to providing a path to rigorous binding 
free energies from molecular docking, implicit ligand the- 
ory also quantifies existing notions [56] about whether 
molecular recognition proceeds by induced fit or con- 
formational selection [57, 58]. As all receptor configu- 
rations have finite Boltzmann probability, the issue is 
a matter of degree. Suppose that a receptor binds to 
two different ligands with the same binding free energy, 
one by conformational selection and the other by in- 
duced fit. If, to a good approximation, the complex is 
dominated by a single structure with receptor config- 
uration r* R such that B{r R ) = oo for all other recep- 
tor configurations, then Eq. (11) simplifies to AG° = 
U{r* R ) + B{r* R ) + (3- 1 lnZ R + AG 6 . For the ligand that 
binds by conformational selection, p{r* R ) = e~P u ( rR > /Zr 
has a reasonably high probability. In the induced fit com- 
plex, U{r* R ) is much less favorable and B(r* R ) must com- 
pensate accordingly to achieve the same AG°. 

Source code and data used in this paper are available 
at https : / /simtk . org/home/implicit_ligand. 
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SUPPLEMENTAL MATERIAL 

This supplemental material contains one theoretical 
section, two tables, and three figures. The theoretical 
section describes a hybrid implicit-explicit solvent model. 
One table is for components of the binding PMF and the 
other for average potential energies. The figures show 
the convergence of binding PMF and binding free energy 
estimates, as well histogram of binding PMF estimates 
for different receptor snapshots. 



Hybrid Implicit-Explicit Solvent 

The desire to combine the speed of implicit solvent 
with the molecular detail and accuracy of explicit sol- 
vent has inspired interest in hybrid implicit-explicit sol- 
vent models (see Wagoner and Pande [60] and references 
therein). In the context of implicit ligand theory, a small 
number of explicit solvent molecules can be considered 
as a part of the receptor [61] during binding PMF calcu- 
lations. 

A simple formalism for a hybrid implicit-explicit sol- 
vent model may be derived by separating the coordinates 
of N solvent molecules rg into explicitly represented co- 
ordinates te and implicitly represented coordinates rj. 



Partition functions analogous to and formally equivalent 
to Eqs. (6) and (7) are then defined as, 

Z RU = J I s e-^ u( - rRL ' r ^ +w ^ rRL ' rE ^dr RL dr E (26) 
Z w = f e-P [u{rn ' rE)+w{rR < rE)] dr R dr E . (27) 



Defining the effective interaction energy as $'(r R L, te) = 
M^hl, te) — U{r Rl te) —U{ri) and the binding PMF as, 

f J I se -^(r^r E)+ U(r L)]drLd§L ^ 

B(r R ,r E ) = -P JJ^m^dT L J 

the binding free energy may be written as, 



AG° = -r 1 ln(4r RL ' G ° 



Z' R Z L 8ir* 

/ j e -P[B'{r R ,r E )+U(r R ,r E )] drRdrE ftfjo \ 
11 I J e-PU{rn,r E ) dTRdrE g^T J 

= -/?- 1 ln(e-' 3B ) +AG 5 , (28) 

\ / R,E 

where q R , E = e~^ ut ^ rR ' rE \ The main text focuses on 
describing calculations in implicit solvent, with the un- 
derstanding that explicit solvent may be readily included. 
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Ligand 


NAMD 


M2 


PB 


PBSA 


min{*(ri J L)} 


AD1 


-14.1 (0.79) 


-22.0 (0.51) 


-23.0 (0.82) 


-25.5 (0.83) 


-31.3 (0.55) 


AD 2 


-32.5 (0.15) 


-29.0 (0.13) 


-26.8 (0.12) 


-29.4 (0.12) 


-36.9 (0.30) 


AD 3 


-31.0 (0.16) 


-30.7 (0.18) 


-28.9 (0.23) 


-31.6 (0.23) 


-40.3 (0.28) 


AD 4 


-44.0 (0.94) 


-36.7 (1.11) 


-24.0 (1.12) 


-26.9 (1.12) 


-36.1 (0.45) 


AD 5 


-32.2 (0.68) 


-29.0 (0.25) 


-26.0 (0.14) 


-28.5 (0.14) 


-36.2 (0.29) 


B02 


-12.8 (0.41) 


-18.8 (0.38) 


-19.8 (0.53) 


-22.6 (0.53) 


-30.6 (0.54) 


B05 


-40.4 (0.29) 


-30.6 (0.40) 


-19.5 (0.50) 


-22.3 (0.50) 


-34.3 (0.63) 


Bll 


-52.4 (1.50) 


-38.5 (1.81) 


-14.1 (1.72) 


-17.5 (1.63) 


-39.3 (1.90) 


F01 


-1.8 (1.57) 


-5.5 (0.81) 


-10.9 (0.53) 


-13.6 (0.53) 


-24.7 (0.33) 


F02 


-14.8 (0.96) 


-14.1 (0.67) 


-16.2 (0.42) 


-19.2 (0.42) 


-31.7 (0.38) 


F03 


-16.3 (1.61) 


-13.1 (1.03) 


-16.4 (0.95) 


-19.5 (0.94) 


-31.1 (0.71) 


F06 


-30.6 (0.18) 


-20.0 (0.21) 


-21.9 (0.18) 


-25.4 (0.18) 


-37.2 (0.24) 


R-7TC 


0.884 


0.750 


0.454 


0.490 


0.883 


RMSE/tc 


12.8 


7.9 


4.9 


4.7 


12.2 


rj2 


0.827 


0.907 


0.705 


0.712 


0.792 


RMSEGiUon 


10.4 


4.8 


5.4 


4.5 


11.3 



TABLE I. The mean and standard deviation of 15 independent estimates of the binding PMF, B(rn), (kcal/mol) for various 
ligands to the minimized structure of CB[7], based on applying Eq. (23) with different force fields (NAMD, M2, PB, and PBSA 
columns) or on using the minimum observed value of the interaction energy ^(trl) from PBSA energies during the A = and 
A = 1 simulations (min{ , I' (r rl)} column). The bottom rows show the correlation coefficient (R 2 ) and root mean square error 
(RMSE, Eq. (25)) with respect to isothermal titration calorimetry experiments (ITC) and mining minima calculations (Gilson) 
from Moghaddam et al. [33] that result from the dominant state approximation - calculating AG° by using a single binding 
PMF estimate B(r R ) as an estimate for -/3' 1 In (e~ S£ %" in Eq. (11). 



Ligand 


ITC 


Gilson 


NAMD 


M2 


PB 


PBSA 


AD1 


-14.1 


-18.2 


-9.4 (0.23) 


-16.3 (0.15) 


-17.6 (0.25) 


-20.1 (0.25) 


AD2 


-19.4 


-25.9 


-27.9 (0.19) 


-24.3 (0.22) 


-22.9 (0.27) 


-25.4 (0.26) 


AD3 


-20.4 


-25.6 


-35.7 (5.03) 


-28.6 (1.87) 


-23.5 (0.23) 


-26.2 (0.23) 


AD4 


-21.5 


-29.7 


-40.5 (0.21) 


-33.7 (0.32) 


-24.3 (1.11) 


-27.1 (1.06) 


AD5 


-19.1 


-24.1 


-29.5 (1.24) 


-24.0 (0.20) 


-22.0 (0.35) 


-24.4 (0.34) 


B02 


-13.4 


-12.0 


-9.0 (0.38) 


-13.7 (0.16) 


-15.4 (0.26) 


-18.1 (0.25) 


B05 


-19.5 


-23.1 


-38.0 (0.40) 


-27.7 (0.27) 


-18.6 (0.27) 


-21.4 (0.27) 


Bll 


-20.6 


-22.4 


-51.2 (0.34) 


-37.3 (0.24) 


-17.2 (0.53) 


-20.5 (0.51) 


F01 


-12.9 


-10.2 


0.3 (0.82) 


-0.6 (0.34) 


-4.9 (0.26) 


-7.6 (0.25) 


F02 


-16.8 


-12.4 


-12.0 (0.70) 


-9.6 (0.75) 


-11.7 (0.70) 


-14.6 (0.71) 


F03 


-17.2 


-12.2 


-10.2 (0.16) 


-7.3 (0.24) 


-10.2 (0.22) 


-13.2 (0.22) 


F06 


-21.0 


-17.8 


-24.1 (0.34) 


-14.1 (0.46) 


-16.2 (0.51) 


-19.7 (0.52) 


R/TC 




0.782 


0.870 


0.745 


0.671 


0.704 


RMSE/tc 




4.6 


14.0 


9.0 


4.4 


4.5 


R 2 

^Gilson 






0.841 


0.892 


0.923 


0.925 


RMSE G « aon 






11.3 


5.9 


3.4 


2.4 



TABLE II. Estimates of the binding free energy AG° (kcal/mol) of various ligands to CB[7]. First, binding PMFs B(r R ) 
are estimated based on Eq. (23) for 100 receptor snapshots from a simulation in GBSA implicit solvent. Then AG° is 
calculated using Eq. (22). The value in the parentheses is the standard deviation from bootstrapping: the binding free energy 
is estimated based on 1000 random selections of 100 binding PMFs. The experimental and Gilson columns are isothermal 
calorimetry measurements (ITC) and M2 calculations, respectively, taken from Moghaddam et al. [33]. The bottom rows are 
the correlation coefficient (R 2 ) and root mean square error (RMSE, Eq. (25)) with respect to the ITC and Gilson columns. 



14 



Ligand 


VDW 


Coul 


PB 


Val 


NP 


Total 


AD1 


-32.5 


(0.471) 


0.1 (1.509) 


4.8 (1.547) 


-5.0 (2.785) 


-2.5 


(0.011) 


-35.2 


(2.547) 


AD2 


-33.6 


(0.931) 


-65.8 (1.032) 


64.9 (0.783) 


-5.9 (1.910) 


-2.5 


(0.017) 


-42.9 


(1.693) 


AD3 


-32.8 


(0.718) 


-64.4 (0.693) 


62.2 (0.855) 


-5.7 (2.128) 


-2.6 


(0.009) 


-43.4 


(2.388) 


AD4 


-38.1 


(1.400) 


-125.2 (3.283) 


124.4 (1.003) 


1.9 (4.475) 


-2.7 


(0.070) 


-39.9 


(2.817) 


AD5 


-33.3 


(1.374) 


-65.1 (1.549) 


64.8 (1.415) 


-4.9 (1.782) 


-2.5 


(0.025) 


-40.9 


(1.834) 


B02 


-33.3 


(0.622) 


-5.8 (1.067) 


9.7 (0.770) 


1.4 (3.379) 


-2.7 


(0.022) 


-30.6 


(2.187) 


B05 


-32.9 


(0.896) 


-138.2 (1.231) 


138.0 (1.151) 


-2.3 (1.236) 


-2.8 


(0.013) 


-38.1 


(1.673) 


Bll 


-39.9 


(1.192) 


-199.3 (2.280) 


212.0 (1.066) 


-5.6 (5.431) 


-3.4 


(0.075) 


-36.2 


(4.475) 


F01 


-26.2 


(0.497) 


-8.2 (1.824) 


14.2 (1.119) 


8.7 (4.842) 


-2.7 


(0.017) 


-14.3 


(4.079) 


F02 


-26.9 


(1.518) 


-65.7 (2.078) 


65.9 (0.923) 


-0.9 (2.237) 


-3.0 


(0.012) 


-30.6 


(2.253) 


F03 


-28.7 


(0.832) 


-58.0 (0.987) 


64.2 (0.649) 


-0.4 (3.552) 


-3.0 


(0.015) 


-26.1 


(3.428) 


F06 


-35.1 


(1.154) 


-116.1 (0.810) 


120.9 (0.651) 


-8.8 (4.862) 


-3.5 


(0.013) 


-42.6 


(4.132) 



TABLE III. Estimates of the mean potential energy changes (kcal/mol) upon the binding of various ligands to CB[7]. The 
columns refer to van der Waals (VDW), coulomb (Coul), electrostatic solvation (PB), valence (Val, bond + angle + dihedral), 
nonpolar solvation (NP), and total energies. The value in the parentheses is the standard deviation from bootstrapping: the 
observable is estimated based on 1000 random selections of 100 values of 0. In Table SII of the Supplemental Material, mean 
potential energies for the ligand, receptor, and complex are also shown. 



Ligand 




B(r R ) 




min{*(r H )} 


HREX 


HREX 


AG° 


min|_B(r H )| 


EXP 


min ji?(r fl ) 


> EXP 


AD1 


-28.6 


-27.2 


-22.0 


-20.1 


AD2 


-36.4 


-34.6 


-27.6 


-25.4 


AD3 


-38.1 


-36.8 


-27.6 


-26.2 


AD4 


-43.1 


-40.4 


-29.8 


-27.1 


AD5 


-35.8 


-33.6 


-26.8 


-24.4 


B02 


-29.8 


-27.9 


-21.0 


-18.1 


B05 


-37.9 


-35.6 


-23.7 


-21.4 


Bll 


-48.5 


-45.7 


-23.1 


-20.5 


F01 


-22.7 


-21.3 


-10.2 


-7.6 


F02 


-30.9 


-28.8 


-17.0 


-14.6 


F03 


-28.7 


-27.0 


-14.5 


-13.2 


F06 


-35.6 


-33.8 


-21.3 


-19.7 




0.849 


0.855 


0.684 


0.704 


RMSE/tc 


17.3 


15.3 


5.8 


4.5 




0.787 


0.795 


0.926 


0.925 


RMSE Gi i son 


15.8 


13.9 


3.5 


2.4 


p2 

^Exp 


0.723 


0.736 


0.996 




RMSE Exp 


15.5 


13.6 


2.3 





TABLE IV. Estimates of the binding free energy AG° (kcal/mol) using the PBSA model. First, the binding PMF B(r R ) 
is estimated with the dominant state approximation (min {^(r^)}) or based on Eq. (23) (HREX). Then, AG° is from the 
dominant state approximation (min ^B(tr) j) or based on Eq. (22) (EXP). The bottom rows show the correlation coefficient 

(R 2 ) and root mean square error (RMSE, Eq. (25)) with respect to isothermal titration calorimetry experiments (ITC) and 
mining minima calculations (Gilson) from Moghaddam et al. [33], and the fourth column. 



Ligand 



AD1 

AD2 

AD3 

AD4 

AD 5 

B02 

B05 

Bll 

F01 

F02 

F03 

F06 

Net formal charge 



Charge 3 



B, 



cpl 



-31.6 (0.09) 
-91.3 (0.12) 
-93.2 (0.17) 
■148.0 (0.41) 
-90.5 (0.13) 
-31.9 (0.09) 
■156.8 (0.13) 
■220.7 (0.83) 
-25.2 (0.27) 
-84.2 (0.24) 
-82.1 (0.51) 
■144.6 (0.15) 



B 



RL.NAMD 



Br 



-115.9 (0.78) 
-123.7 (0.10) 
-118.9 (0.06) 
-202.5 (0.62) 
-124.3 (0.76) 
-118.4 (0.37) 
-187.3 (0.27) 
-437.9 (1.24) 
-117.2 (1.50) 
-112.1 (0.87) 
-111.8 (2.04) 
-152.3 (0.09) 



-2.8 (0.01) 
-51.9 (0.04) 
-50.5 (0.04) 
■175.9 (0.63) 
-52.1 (0.05) 

-6.9 (0.05) 
■173.2 (0.11) 
■475.7 (0.65) 
-10.1 (0.10) 
-50.9 (0.07) 
-47.0 (0.04) 
■135.7 (0.10) 



Br 



B L 



-111.5 (0.50) 
-112.8 (0.04) 
-108.9 (0.05) 
-192.2 (0.75) 
-111.9 (0.29) 
-111.4 (0.33) 
-176.4 (0.35) 
-422.5 (1.20) 
-65.0 (0.75) 
-77.9 (0.57) 
-76.2 (1.45) 
-123.3 (0.12) 



-1.1 (0.01) 
-55.2 (0.04) 
-51.4 (0.04) 
■183.5 (0.58) 
-53.5 (0.02) 

-4.5 (0.03) 
■182.7 (0.18) 
■484.7 (0.97) 

35.3 (0.07) 
-28.0 (0.11) 
-25.2 (0.07) 
■127.9 (0.18) 



B f 



B L 



-128.1 (0.81) 
-123.4 (0.02) 
-122.3 (0.10) 
-188.8 (0.87) 
-123.5 (0.03) 
-129.1 (0.49) 
-173.4 (0.45) 
-404.2 (1.93) 
-92.5 (0.45) 
-96.6 (0.32) 
-97.0 (1.37) 
-139.0 (0.09) 



-4.5 (0.01) 
-55.8 (0.03) 
-54.5 (0.04) 
-180.6 (0.54) 
-55.9 (0.03) 

-9.0 (0.02) 
-178.6 (0.15) 
-478.7 (0.74) 

25.4 (0.07) 
-32.5 (0.08) 
-30.5 (0.05) 
-129.5 (0.14) 



B 



RL,PBSA 



Br 



-122.3 (0.81) 
-117.6 (0.02) 
-116.4 (0.10) 
-182.8 (0.86) 
-117.6 (0.03) 
-123.3 (0.49) 
-167.5 (0.45) 
-396.9 (1.80) 
-86.6 (0.46) 
-90.7 (0.32) 
-91.1 (1.36) 
-132.9 (0.09) 



-2.6 (0.01) 
-53.8 (0.03) 
-52.3 (0.04) 
-178.1 (0.54) 
-53.9 (0.03) 

-6.8 (0.02) 
-176.4 (0.15) 
-474.3 (0.75) 

27.6 (0.08) 
-29.9 (0.08) 
-27.9 (0.05) 
-126.3 (0.14) 



TABLE SI. The mean and standard deviation of 15 independent estimates for components of the binding PMF B(rn) (kcal/mol), as described in Eq. (23), for various 
ligands to the minimized structure of Cu[7]. 
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Average Potential Energy of Complexes 



Ligand 


VDW 


Com 


PB 


Val 




T\TT) 
JNF 


Total 


AD1 


-90.2 (0.417) 


50.7 (1.415) 


-131.6 


(1.500) 


328.2 


(2.651) 


5.8 


(0.011) 


162.8 


(2.400) 


AD2 


-91.9 (0.905) 


29.7 (0.888) 


-122.9 


(0.687) 


328.5 


(1.710) 


5.8 


(0.017) 


149.2 


(1.464) 


AD3 


-91.7 (0.684) 


23.7 (0.452) 


-124.3 


(0.768) 


333.8 


(1.950) 


5.8 


(0.009) 


147.4 


(2.231) 


AD4 


-98.1 (1.382) 


79.7 (3.240) 


-191.4 


(0.930) 


348.1 


(4.394) 


6.1 


(0.070) 


144.4 


(2.685) 


AD5 


-91.5 (1.357) 


19.7 (1.457) 


-123.1 


(1.364) 


331.6 


(1.565) 


5.8 


(0.025) 


142.5 


(1.624) 


B02 


-87.2 (0.582) 


39.8 (0.929) 


-131.2 


(0.672) 


340.6 


(3.270) 


5.8 


(0.021) 


167.8 


(2.014) 


B05 


-90.5 (0.869) 


29.4 (1.113) 


-173.0 


(1.087) 


340.4 


(0.896) 


5.8 


(0.012) 


112.1 


(1.440) 


Ull 


-lUo.U (1.1 f 1) 


QQ(^ n f'O oi o^ 

oou.y (z./ioj 


-404.0 


(0.997) 


379.8 


(5.364) 


— •) 
i ■■> 


(U.UIO) 


217.0 


(4.393) 


F01 


-88.1 (0.446) 


84.0 (1.747) 


-127.1 


(1.054) 


515.7 


(4.767) 


5.8 


(0.016) 


390.4 


(3.989) 


F02 


-89.4 (1.502) 


56.6 (2.010) 


-119.3 


(0.843) 


514.2 


(2.068) 


5.9 


(0.012) 


368.1 


(2.086) 


F03 


-90.1 (0.802) 


55.2 (0.835) 


-117.4 


(0.529) 


518.1 


(3.448) 


5.9 


(0.015) 


371.7 


(3.320) 


F06 


-97.3 (1.133) 


63.0 (0.616) 


-153.2 


(0.532) 


525.4 


(4.787) 


6.1 


(0.012) 


343.9 


(4.044) 



Ligand 



Average Potential Enerj 
VDW Coul 



y of Ligands 
PB 



Val 



NP 



Total 



AD1 

AD2 

AD3 

AD4 

AD5 

B02 

B05 

Bll 

F01 

F02 

F03 

F06 



-1.8 (0.0063) 
-2.5 (0.0049) 
-3.0 (0.0058) 



-6.0 (0.0036) 
-6.5 (0.0055) 
-5.5 (0.0078) 



-8.5 (0.0012) 
36.3 (0.0036) 

29.0 (0.0028) 
-4.1 (0.0056) 145.8 (0.0124) 
-2.4 (0.0052) 25.6 (0.0032) 
1.9 (0.0103) -13.6 (0.0032) 
-1.7 (0.0069) 108.3 (0.0081) 
-7.2 (0.0116) 477.0 (0.0132) 

33.1 (0.0059) 
63.1 (0.0069) 
54.1 (0.0069) 



-6.4 (0.0101) 120.0 (0.0109) 



-4.6 (0.0014) 
-55.9 (0.0028) 
-54.6 (0.0028) 
-183.9 (0.0089) 
-56.1 (0.0030) 

-9.1 (0.0025) 
-179.2 (0.0055) 
-484.1 (0.0106) 

-9.5 (0.0036) 
-53.4 (0.0037) 
-49.8 (0.0035) 
-142.3 (0.0064) 



52.6 (0.0232) 2.0 
53.8 (0.0246) 2.0 

58.8 (0.0259) 2.1 

65.6 (0.0291) 2.5 

55.9 (0.0245) 2.0 

58.7 (0.0257) 2.2 
62.1 (0.0270) 2.3 
104.8 (0.0268) 4.4 

226.5 (0.0251) 2.2 

234.6 (0.0268) 2.6 
238.0 (0.0286) 2.6 
253.6 (0.0341) 3.3 



;o.oooi 
;o.oooi 
;o.oooi 
;o.oooi 
;o.oooi 
;o.oooi 
;o.oooi 
;o.ooo2 
;o.oooi 
;o.ooo2 
;o.oooi 

0.0002 



39.7 (0.0236) 

33.7 (0.0248) 
32.4 (0.0264) 
25.9 (0.0270) 

25.0 (0.0248) 

40.1 (0.0255) 
-8.2 (0.0273) 

94.8 (0.0232) 
246.4 (0.0258) 
240.4 (0.0270) 
239.4 (0.0284) 
228.1 (0.0339) 



Average Potential Energy of the Receptor 
VDW Coul PB Val 



NP 



Total 



-55.9 (0.2198) 59.2 (0.5253) -131.8 (0.3756) 280.6 (0.8513) 6.3 (0.0039) 158.4 (0.8516) 
Average Potential Energy Changes 



Ligand 


VDW 


Coul 


PB 


Val 


NP 


Total 


AD1 


-32.5 


(0.471) 


0.1 (1.509) 


4.8 (1.547) 


-5.0 (2.785) 


-2.5 


(0.011) 


-35.2 


(2.547) 


AD2 


-33.6 


(0.931) 


-65.8 (1.032) 


64.9 (0.783) 


-5.9 (1.910) 


-2.5 


(0.017) 


-42.9 


(1.693) 


AD3 


-32.8 


(0.718) 


-64.4 (0.693) 


62.2 (0.855) 


-5.7 (2.128) 


-2.6 


(0.009) 


-43.4 


(2.388) 


AD4 


-38.1 


(1.400) 


-125.2 (3.283) 


124.4 (1.003) 


1.9 (4.475) 


-2.7 


(0.070) 


-39.9 


(2.817) 


AD5 


-33.3 


(1.374) 


-65.1 (1.549) 


64.8 (1.415) 


-4.9 (1.782) 


-2.5 


(0.025) 


-40.9 


(1.834) 


B02 


-33.3 


(0.622) 


-5.8 (1.067) 


9.7 (0.770) 


1.4 (3.379) 


-2.7 


(0.022) 


-30.6 


(2.187) 


B05 


-32.9 


(0.896) 


-138.2 (1.231) 


138.0 (1.151) 


-2.3 (1.236) 


-2.8 


(0.013) 


-38.1 


(1.673) 


Bll 


-39.9 


(1.192) 


-199.3 (2.280) 


212.0 (1.066) 


-5.6 (5.431) 


-3.4 


(0.075) 


-36.2 


(4.475) 


F01 


-26.2 


(0.497) 


-8.2 (1.824) 


14.2 (1.119) 


8.7 (4.842) 


-2.7 


(0.017) 


-14.3 


(4.079) 


F02 


-26.9 


(1.518) 


-65.7 (2.078) 


65.9 (0.923) 


-0.9 (2.237) 


-3.0 


(0.012) 


-30.6 


(2.253) 


F03 


-28.7 


(0.832) 


-58.0 (0.987) 


64.2 (0.649) 


-0.4 (3.552) 


-3.0 


(0.015) 


-26.1 


(3.428) 


F06 


-35.1 


(1.154) 


-116.1 (0.810) 


120.9 (0.651) 


-8.8 (4.862) 


-3.5 


(0.013) 


-42.6 


(4.132) 



TABLE SII. Estimates of the mean potential energy 
(kcal/mol) of the ligand, receptor, and complex for different 
Cu[7] ligands. The columns refer to van der Waals (VDW), 
coulomb (Coul), electrostatic solvation (PB), valence (Val, 
bond + angle + dihedral), nonpolar solvation (NP), and total 
energies. The value in the parentheses is the standard devi- 
ation from bootstrapping: the observable is estimated based 
on 1000 random selections of 100 values of B. 



17 



-21.5 

3= -25 
co 

-28.5 
-30.3 
f -31 
-31.7 
-116.5 
-120 
-123.5 



CD 



CO 





-2.5 


_l 




CD 






-2.7 




2.8 


"""111 


-26.5 


CC 




E 


-30 






E 


-33.5 



AD1 



fflte i 1 1 1 1 i ^ i 1 1 1 1 1 1 1 



0.5 1 1.5 2 

Total Simulation Time (ns) 





-29.3 


CC 

1 


-30 


CO 






-30.7 




-90.3 


~Cl 

o 

CD 


-91 




-91.7 




117.2 


_i 




^117.5 




117.9 




-53.6 


_i 




CD 


-53.8 




-54 


' lj 


-34.6 


CC 




E 


-36 


E 


-37.4 



pE-f-H-^ i 1 1 1 1 1 - 



f ffi 1 1 1 1 1 1 1 1 



0.5 1 1.5 2 

Total Simulation Time (ns) 



CO 



CO 



CO 



-30.6 
-32 
-33.4 
-92.3 
-93 
-93.7 
-116.2 
fC-116.5 



co - 



116.9 
-52.2 
52.5 



E" 
E 



-52.9 
-38.6 

-40 

-41.4 



AD3 



il 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 



-j^HHH iiiiiiii iiiiiii:eii i 



IIIIIIIIi 



0.5 1 1.5 

Total Simulation Time (ns) 





-21.5 


CC 

1 

CO 


-25 
-28.5 




145.8 


Q. 

u. 

CO 


147.5 




149.3 




176.5 


_l 
CC 
CO 


-180 




183.5 




175.8 


_l 
CO - 


177.5 




179.3 


' lj 
CC 


-31.5 


1 

E 


-35 


E 


-38.5 







AD4 . 


























Illllininiir 




ttfiilliii-llliiilili. 












BtfHIHIIIII 




h i i i i i i i i i i-i i i i i i i i - 





0.5 1 1.5 

Total Simulation Time (ns) 



FIG. SI. (a) The mean and standard deviation of 15 indepen- 
dent estimates of B(rn), B cp i, Brl, Bl, and min-f^rR^)} 
(kcal/mol) based on PBS A energies as a function of total MD 
simulation time. 
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FIG. SI. (b) The mean and standard deviation of 15 indepen- 
dent estimates of B(rn), B cp i, Brl, Bl, and min{>t(rR.L)} 
(kcal/mol) based on PBS A energies as a function of total MD 
simulation time. 
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FIG. SI. (c) The mean and standard deviation of 15 indepen- 
dent estimates of B(rn), B cp i, Brl, Bl, and min-f^rR^)} 
(kcal/mol) based on PBS A energies as a function of total MD 
simulation time. 
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FIG. S3. Estimates of the binding free energy AG° of various 
ligands to Cu[7] (kcal/mol), using PBS A energies, as a func- 
tion of the number of receptor snapshots. The line and error 
bars denote the mean and standard deviation from bootstrap- 
ping: the binding free energy is estimated 100 times using 
random selections of N out of 100 binding PMFs. 



